knitr::opts_chunk$set(echo = TRUE)

# Load libraries for homework problems
library(tidyverse)
library(gt)
library(patchwork)

# Read in COVID-19 data
# R/make_data.R creates this file
cv19 <- read_csv('data/usa_covid19.csv')

Overview

The COVID-19 pandemic is an ongoing public health emergency in the United States (US) and worldwide. Since 2020-01-21, the New York Times has monitored and shared COVID-19 data (available in its GitHub repository) from across the US at the state and county level.

Data dictionary

I have modified the New York Times data to include each state’s population. The data are described below:

c("date" = "Date", 
  "state" =  "State in the US", 
  "cases_total" = "Total number of cases as of date", 
  "deaths_total" = "Total number of deaths as of date",
  "pop_2015" = "Estimated population as of 2015"
) %>% 
  enframe() %>% 
  gt(rowname_col = "name") %>%
  tab_stubhead(label = 'Variable name') %>% 
  cols_label(value = 'Variable description') %>% 
  cols_align('right') %>% 
  tab_footnote(locations = cells_body(rows = 5, columns = 2),
    footnote = "Source: usmap::countypop") %>% 
  tab_footnote(locations = cells_body(columns = 2, rows = 2), 
    footnote = 'US = United States') %>% 
  tab_header(title = 'Dictionary for New York Times COVID-19 data',
    subtitle = paste("Last updated:", max(cv19$date)))
Dictionary for New York Times COVID-19 data
Last updated: 2020-04-03

Variable name   Variable description
date            Date
state           State in the US [1]
cases_total     Total number of cases as of date
deaths_total    Total number of deaths as of date
pop_2015        Estimated population as of 2015 [2]

[1] US = United States
[2] Source: usmap::countypop

Data pages

The data (cv19) are printed below:

cv19

Problem 1

Create two new columns in cv19:

  • cases_new: the number of new cases identified on a given day for a given state.

  • deaths_new: the number of new deaths confirmed on a given day for a given state.

Notes:

  • The lag() function is helpful for this.

  • Your solution should look like this:

read_rds('solutions/01_solution.rds')
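As a hint for how lag() fits in, here is a minimal sketch on a small made-up data set. The object names toy and toy_new and all values are hypothetical; only the column names match cv19. It assumes that a state's first recorded day should count its whole running total as new (default = 0), which may or may not match the reference solution.

```r
library(dplyr)

# Hypothetical stand-in for cv19: two states, three days each
toy <- tibble(
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                   "2020-03-01", "2020-03-02", "2020-03-03")),
  state = rep(c("A", "B"), each = 3),
  cases_total = c(1, 4, 9, 0, 2, 2),
  deaths_total = c(0, 1, 1, 0, 0, 1)
)

toy_new <- toy %>%
  group_by(state) %>%                      # lag within each state, not across states
  arrange(date, .by_group = TRUE) %>%      # make sure days are in order
  mutate(
    # new count = today's running total minus yesterday's running total;
    # default = 0 is an assumption for the first day in each state
    cases_new = cases_total - lag(cases_total, default = 0),
    deaths_new = deaths_total - lag(deaths_total, default = 0)
  ) %>%
  ungroup()

toy_new
```

Grouping by state before calling lag() matters: without it, the first day of one state would be compared with the last day of the previous state.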

Problem 2

Compute the total number of new cases identified and deaths confirmed each day in the USA on or after March 1st, 2020. Your summarized data should look like this:

read_rds('solutions/02_solution.rds')
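One way to approach this summary is to filter by date and then sum within each day. The sketch below uses hypothetical data (toy_new and usa_daily are illustrative names, not objects from the homework) and assumes the columns cases_new and deaths_new from problem 1 already exist.

```r
library(dplyr)

# Hypothetical daily counts for two states
toy_new <- tibble(
  date = as.Date(c("2020-02-29", "2020-02-29",
                   "2020-03-01", "2020-03-01",
                   "2020-03-02", "2020-03-02")),
  state = rep(c("A", "B"), times = 3),
  cases_new = c(1, 0, 3, 2, 5, 0),
  deaths_new = c(0, 0, 1, 0, 0, 1)
)

usa_daily <- toy_new %>%
  filter(date >= as.Date("2020-03-01")) %>%  # on or after March 1st, 2020
  group_by(date) %>%                         # one row per day across all states
  summarise(
    cases_new = sum(cases_new),
    deaths_new = sum(deaths_new),
    .groups = "drop"
  )

usa_daily
```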

Problem 3

Using the data created in problem 2, create two bar plots showing the number of new cases identified and deaths confirmed in the USA on or after March 1st, 2020.

Note: This is a great chance to learn about the patchwork R package.

Your solution should look like this:

read_rds('solutions/03_solution.rds')
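The patchwork package combines separate ggplot objects with the + operator. The following is a rough sketch with made-up numbers; usa_daily, p_cases, p_deaths, and combined are hypothetical names, and the styling of the reference solution is not reproduced here.

```r
library(ggplot2)
library(patchwork)

# Hypothetical national daily counts
usa_daily <- data.frame(
  date = as.Date("2020-03-01") + 0:4,
  cases_new = c(5, 8, 13, 21, 34),
  deaths_new = c(0, 1, 1, 2, 3)
)

# One bar plot per outcome; geom_col() draws bars at the values given
p_cases <- ggplot(usa_daily, aes(x = date, y = cases_new)) +
  geom_col() +
  labs(x = "Date", y = "New cases")

p_deaths <- ggplot(usa_daily, aes(x = date, y = deaths_new)) +
  geom_col() +
  labs(x = "Date", y = "New deaths")

# patchwork: `+` places the two plots side by side
combined <- p_cases + p_deaths
combined
```

Using p_cases / p_deaths instead would stack the plots vertically.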

Problem 4

Add four new columns to the data you created in problem 1:

  • cases_per100k: Number of cases per 100,000 citizens
  • deaths_per100k: Number of deaths per 100,000 citizens
  • cases_dbl_days: Number of days until case count doubles, based on current day’s case count
  • deaths_dbl_days: Number of days until death count doubles, based on current day’s death count.
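The rate columns follow directly from the totals and pop_2015. The doubling-time definition is not spelled out above, so the sketch below assumes one simple possibility: if the count keeps growing by today's new count every day, it doubles after total / new days. This is an assumption, not necessarily the formula behind the reference solution; it does yield Inf on a day with no new events. All data and the names toy_new and toy_rates are hypothetical.

```r
library(dplyr)

# Hypothetical single-day snapshot for two states
toy_new <- tibble(
  state = c("A", "B"),
  cases_total = c(200, 50),
  cases_new = c(20, 0),
  deaths_total = c(10, 2),
  deaths_new = c(4, 1),
  pop_2015 = c(1e6, 5e5)
)

toy_rates <- toy_new %>%
  mutate(
    # counts per 100,000 people
    cases_per100k = cases_total / pop_2015 * 1e5,
    deaths_per100k = deaths_total / pop_2015 * 1e5,
    # ASSUMED definition: days needed to add another `total` at today's pace.
    # Division by zero gives Inf when there are no new events
    # (and NaN if the total is also zero).
    cases_dbl_days = cases_total / cases_new,
    deaths_dbl_days = deaths_total / deaths_new
  )

toy_rates
```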

Challenge yourself:

  • Filter the data you have created in this problem to contain only the most recent day.

  • Identify the 10 states that have the highest death rate per 100,000 citizens.

  • Tabulate the total number, rate, and days to double for cases and deaths in each of these 10 states.
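The first two challenge steps can be sketched with filter() and slice_max(). The data below are made up, toy_rates and top_states are hypothetical names, and n = 2 stands in for the 10 states asked for above.

```r
library(dplyr)

# Hypothetical rates for three states on two days
toy_rates <- tibble(
  date = as.Date(rep(c("2020-04-02", "2020-04-03"), each = 3)),
  state = rep(c("A", "B", "C"), times = 2),
  deaths_per100k = c(1.0, 3.0, 2.0, 1.5, 2.5, 4.0)
)

top_states <- toy_rates %>%
  filter(date == max(date)) %>%        # keep only the most recent day
  slice_max(deaths_per100k, n = 2)     # highest death rates first (use n = 10 on cv19)

top_states
```

For the final table, the gt function tab_spanner() can group columns under shared labels such as "Cases" and "Deaths".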

Your solution should look like this:

read_rds('solutions/04_solution.rds')
Ten states in the US with highest death rates due to COVID-19
Data presented for: 2020-04-03

                      Cases                                           Deaths
                      Total count  Rate per 100k  No. days to double  Total count  Rate per 100k  No. days to double
New York              102,870      519.7          9.2                 2,935        14.8           9.4
Louisiana             10,297       220.5          8.0                 370          7.9            5.2
New Jersey            29,895       333.7          5.9                 647          7.2            5.0
Michigan              12,670       127.7          5.7                 478          4.8            6.8
Washington            6,966        97.2           17.3                293          4.1            13.0
Connecticut           4,915        136.9          3.5                 132          3.7            5.6
Massachusetts         10,402       153.1          6.2                 192          2.8            4.1
Vermont               389          62.1           6.6                 17           2.7            Inf
District of Columbia  757          112.6          6.3                 15           2.2            4.0
Colorado              4,182        76.6           8.2                 110          2.0            6.9

Problem 5

Learn something new: take a look at a famous flipbook created by Gina Reynolds. The cv19 data have a very similar structure to the data used in the flipbook from Gina’s talk. Learn about the ggplot2 tools that are used in the flipbook and try to adapt them to create the ‘racing bar chart’ below.